
    Efficient resources assignment schemes for clustered multithreaded processors

    New feature sizes provide a larger number of transistors per chip that architects could use to further exploit instruction-level parallelism. However, these technologies also bring new challenges that complicate conventional monolithic processor designs. On the one hand, exploiting instruction-level parallelism is yielding diminishing returns, so other sources of parallelism, such as thread-level parallelism, must be exploited to keep raising performance at a reasonable hardware complexity. On the other hand, clustered architectures have been widely studied as a way to reduce the inherent complexity of current monolithic processors. This paper studies the synergies and trade-offs between two concepts, clustering and simultaneous multithreading (SMT), in order to understand why conventional SMT resource assignment schemes are not so effective in clustered processors. These trade-offs are used to propose a novel resource assignment scheme that achieves an average speedup of 17.6% over Icount while improving fairness by 24%. Peer Reviewed. Postprint (published version).

    Frontend frequency-voltage adaptation for optimal energy-delay²

    In this paper, we present a clustered, multiple-clock domain (CMCD) microarchitecture that combines the benefits of both clustering and globally asynchronous locally synchronous (GALS) designs. We also present a mechanism for dynamically adapting the frequency and voltage of the frontend of the CMCD with the goal of optimizing the energy-delay² product (ED2P). Our mechanism has minimal hardware cost, is entirely self-adjustable, does not depend on any thresholds, and achieves results close to optimal. We evaluate it on 16 SPEC 2000 applications and report a 17.5% ED2P reduction on average (80% of the upper bound). Peer Reviewed. Postprint (published version).

    Control speculation for energy-efficient next-generation superscalar processors

    Conventional front-end designs attempt to maximize the number of "in-flight" instructions in the pipeline. However, branch mispredictions cause the processor to fetch useless instructions that are eventually squashed, increasing front-end energy and issue queue utilization and, thus, wasting around 30 percent of the power dissipated by a processor. Furthermore, processor design trends lead to increasing clock frequencies by lengthening the pipeline, which puts more pressure on the branch prediction engine since branches take longer to be resolved. As next-generation high-performance processors become deeply pipelined, the amount of wasted energy due to misspeculated instructions will go up. The aim of this work is to reduce the energy consumption of misspeculated instructions. We propose selective throttling, which triggers different power-aware techniques (fetch throttling, decode throttling, or disabling the selection logic) depending on the branch prediction confidence level. Results show that combining fetch-bandwidth reduction with select-logic disabling provides the best performance in terms of overall energy reduction and energy-delay product improvement (14 percent and 10 percent, respectively, for a processor with a 22-stage pipeline and 16 percent and 13 percent, respectively, for a processor with a 42-stage pipeline). Peer Reviewed. Postprint (published version).

    The phonological status of English oral stops after tautosyllabic /s/ : evidence from speakers' classificatory behaviour

    The classification of oral stops after tautosyllabic /s/ in English is an old phonological problem to which different solutions have been proposed. To provide experimental evidence on how native speakers of English classify oral bilabial stops after tautosyllabic /s/, a concept formation experiment was conducted. The results showed that, of the four main theoretical views in phonology on the classification of oral stops after tautosyllabic /s/, the solution that treats those speech segments as allophones of the phonemes /p, t, k/ is the most plausible from the point of view of language users' classificatory behaviour.

    Virtual-physical registers

    A novel dynamic register renaming approach is proposed in this work. The key idea of the novel scheme is to delay the allocation of physical registers until a late stage in the pipeline, instead of doing it in the decode stage as conventional schemes do. In this way, the register pressure is reduced and the processor can exploit more instruction-level parallelism. Delaying the allocation of physical registers requires some additional artifact to keep track of dependences. This is achieved by introducing the concept of virtual-physical registers, which do not require any storage location and are used to identify dependences among instructions that have not yet allocated a register to their destination operand. Two alternative allocation strategies have been investigated that differ in the stage where physical registers are allocated: issue or write-back. The experimental evaluation has confirmed the higher performance of the latter alternative. We have evaluated the novel scheme through a detailed simulation of a dynamically scheduled processor. The results show a significant improvement (e.g., a 19% increase in IPC for a machine with 64 physical registers in each file) when compared with the traditional register renaming approach. Peer Reviewed. Postprint (published version).

    Using MCD-DVS for dynamic thermal management performance improvement

    With chip temperature being a major hurdle in microprocessor design, techniques to recover the performance lost to thermal emergency mechanisms are crucial in order to sustain performance growth. Many past techniques for power reduction, and some more recent ones for thermal management, have helped alleviate this problem. Probably the most important thermal control technique is dynamic voltage and frequency scaling (DVS), which allows for an almost cubic reduction in power with only a linear worst-case performance penalty. So far, DVS techniques for temperature control have been studied at the chip level. Finer-grain DVS is feasible if a globally-asynchronous locally-synchronous (GALS) design style is employed. GALS, also known as multiple-clock domain (MCD), allows for independent voltage and frequency control of each of the clock domains that are part of the chip. There are several studies on DVS for GALS that aim to improve energy and power efficiency but not temperature. This paper proposes and analyses the use of DVS at the domain level to control temperature in a clustered MCD microarchitecture, with the goal of improving the performance of applications that do not meet the thermal constraints imposed by the designers. Peer Reviewed. Postprint (published version).

    The string tension from smeared Wilson loops at large N

    We present the results of a high statistics analysis of smeared Wilson loops in 4-dimensional SU(N) Yang-Mills theory for various values of N. The data is used to analyze the behaviour of smeared Creutz ratios, extracting from them the value of the string tension and other asymptotic parameters. A scaling analysis allows us to extrapolate to the continuum limit for N=3,5,6 and 8. The results are consistent with a 1/N^2 approach towards the large N limit. The same analysis is done for the TEK model (one-point lattice) for N=841 and a non-minimal symmetric twist with flux of k=9. The results match perfectly with the extrapolated large N values, confirming the validity of the reduction idea for this range of parameters. Comment: Enlarged revised version with 2 tables and 3 figures.

    The string tension for Large N gauge theory from smeared Wilson loops

    Using smeared Creutz ratios we extract the string tension for SU(N) pure gauge theory and N=3,4,5,6,8. We employ these results to extrapolate to large N. The same methodology is applied to the single-site Twisted Eguchi Kawai model. The corresponding string tension matches perfectly within errors with the extrapolated one, providing strong evidence in favour of the twisted reduction framework. Interesting results are also obtained on the behaviour of Creutz ratios for large sizes. Comment: 7 pages and 3 figures. Contribution to Lattice 2012 in Cairns.